About


Original Tweet Count: 2,996,979

Final Tweet Count: 511,675

Date Selected: July 1st, 2021
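Together, the two counts above mean that roughly one in six tweets from the selected day survived filtering and hydration. The short Python sketch below works through that arithmetic. It uses the 540,642 English-language, non-retweet tweets from the methodology as the intermediate stage. The helper name `retention` is illustrative only.

```python
# Tweet counts reported for July 1st, 2021.
ORIGINAL = 2_996_979   # tweets in the pre-curated dataset for that day
FILTERED = 540_642     # English-language, non-retweet tweets sent for hydration
FINAL = 511_675        # tweets successfully hydrated

def retention(kept: int, total: int) -> float:
    """Return the share of `total` that was kept, as a percentage."""
    return 100.0 * kept / total

for label, kept, total in [
    ("filter (English, non-retweet)", FILTERED, ORIGINAL),
    ("hydration success", FINAL, FILTERED),
    ("overall", FINAL, ORIGINAL),
]:
    print(f"{label}: {retention(kept, total):.2f}%")

# filter (English, non-retweet): 18.04%
# hydration success: 94.64%
# overall: 17.07%
```

In other words, hydration recovered all but 28,967 of the filtered tweets. The filtering step removed the large majority of the original volume.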


Methodology

Our project aimed to characterize public opinion on the COVID-19 pandemic by applying machine learning to COVID-related tweets. Our methodology is detailed below:

  1. We queried the pre-curated dataset of COVID-related tweets published by Chen et al. in JMIR for tweets posted on July 1st, 2021, identifying a total of 2,996,979 tweets.

  2. We filtered this initial dataset to tweets written in English that were not retweets (i.e., original content). The resulting 540,642 tweets were then hydrated (i.e., their full tweet objects were retrieved from their IDs) in Python using the Twitter API.

  3. The 511,675 successfully hydrated tweets were parsed from JSON/HTML and cleaned in R, followed by feature extraction (e.g., hashtags, URLs, replies, retweets, and location).

  4. Finally, we applied natural language processing methods, such as structural topic modeling, to derive aggregate features from the dataset.
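The Python sketch below illustrates the filtering in step 2 and the feature extraction in step 3. It assumes tweets arrive as Twitter API v1.1-style JSON objects with fields such as `lang`, `retweeted_status`, `entities`, `in_reply_to_status_id`, and `user.location`. The sample records and the helper names `is_original_english` and `extract_features` are hypothetical. The project's own cleaning was done in R, so this sketch only mirrors the logic. It does not reproduce the authors' code.

```python
import json
from typing import Any

# Hypothetical sample records shaped like Twitter API v1.1 tweet objects
# (assumption: the real pipeline's JSON may differ in detail).
RAW = """[
  {"id": 1, "lang": "en",
   "full_text": "Still wearing my mask #COVID19 https://t.co/abc",
   "entities": {"hashtags": [{"text": "COVID19"}],
                "urls": [{"expanded_url": "https://example.org/masks"}]},
   "in_reply_to_status_id": null, "retweet_count": 3, "favorite_count": 10,
   "user": {"location": "Boston, MA"}},
  {"id": 2, "lang": "es", "full_text": "Vacunas disponibles hoy",
   "entities": {"hashtags": [], "urls": []},
   "in_reply_to_status_id": null, "retweet_count": 0, "favorite_count": 1,
   "user": {"location": ""}},
  {"id": 3, "lang": "en", "full_text": "RT @someone: Happy #DoctorsDay",
   "retweeted_status": {"id": 99},
   "entities": {"hashtags": [{"text": "DoctorsDay"}], "urls": []},
   "in_reply_to_status_id": null, "retweet_count": 50, "favorite_count": 0,
   "user": {"location": "Mumbai"}}
]"""

def is_original_english(tweet: dict[str, Any]) -> bool:
    """Keep English-language tweets that are not retweets."""
    return tweet.get("lang") == "en" and "retweeted_status" not in tweet

def extract_features(tweet: dict[str, Any]) -> dict[str, Any]:
    """Flatten one tweet object into hashtag, URL, reply, count, and location features."""
    entities = tweet.get("entities", {})
    return {
        "id": tweet["id"],
        "text": tweet.get("full_text", ""),
        # Hashtags are lower-cased so "#COVID19" and "#covid19" count together.
        "hashtags": [h["text"].lower() for h in entities.get("hashtags", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "is_reply": tweet.get("in_reply_to_status_id") is not None,
        "retweets": tweet.get("retweet_count", 0),
        "favorites": tweet.get("favorite_count", 0),
        # Profile location; an empty string becomes None. (Assumption: the
        # project may have used the tweet's place/coordinates instead.)
        "location": tweet.get("user", {}).get("location") or None,
    }

tweets = json.loads(RAW)
rows = [extract_features(t) for t in tweets if is_original_english(t)]
for row in rows:
    print(row)
```

Only the first record survives. The second is in Spanish, and the third is a retweet.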

Analyses were performed in Python and R. All code is available via our GitHub repository.
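Hydration in step 2 turns tweet IDs back into full tweet objects. Twitter's v1.1 `statuses/lookup` endpoint accepted at most 100 IDs per request, so hydration is usually done in batches. Not every ID comes back, which is why the hydrated count (511,675) is lower than the filtered count (540,642). The sketch below shows the batching and bookkeeping under stated assumptions. `fetch_batch` is a hypothetical stand-in for the real API call and is not a library function. `fake_fetch` simply simulates some tweets being unavailable.

```python
from typing import Callable, Iterator

BATCH_SIZE = 100  # per-request ID limit of Twitter's v1.1 statuses/lookup endpoint

def batched(ids: list[int], size: int = BATCH_SIZE) -> Iterator[list[int]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def hydrate(
    ids: list[int], fetch_batch: Callable[[list[int]], list[dict]]
) -> tuple[list[dict], list[int]]:
    """Hydrate IDs batch by batch; return (tweets, IDs that were not returned)."""
    tweets: list[dict] = []
    for batch in batched(ids):
        tweets.extend(fetch_batch(batch))
    returned = {t["id"] for t in tweets}
    missing = [i for i in ids if i not in returned]
    return tweets, missing

# Hypothetical stand-in for the real API call: pretend every 20th ID
# belongs to a tweet that can no longer be retrieved.
def fake_fetch(batch: list[int]) -> list[dict]:
    return [{"id": i} for i in batch if i % 20 != 0]

ids = list(range(1, 1001))
tweets, missing = hydrate(ids, fake_fetch)
print(len(tweets), len(missing))  # 950 50
```

Keeping the list of missing IDs makes the drop between the filtered and hydrated counts auditable.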


Retweeted Tweets

Favorited Tweets

Topics

Topic 3: mask, wear, still, social, distanc


Topic 3 pertains to masking and social distancing. Representative tweets are displayed here. Please note that tweets have not been filtered for objectionable content, and presentation here does not imply endorsement.
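In structural topic modeling, each tweet receives a vector of topic proportions, often written θ. One common way to choose representative tweets for a topic is to rank documents by their proportion for that topic. This is similar in spirit to the `findThoughts` function in the R `stm` package. The project's exact selection procedure is not stated here, so the Python sketch below is an assumption. Its tweets and proportions are made up, not taken from this project's output.

```python
# Hypothetical tweets and document-topic proportions (theta).
# Columns: 0 = masking, 1 = vaccination, 2 = gratitude (illustrative labels).
docs = [
    "Still wearing my mask indoors",
    "Booked my second dose appointment today",
    "Masks and distancing still matter",
    "Thank you to every doctor #DoctorsDay",
]
theta = [
    [0.70, 0.10, 0.20],
    [0.05, 0.90, 0.05],
    [0.80, 0.05, 0.15],
    [0.10, 0.10, 0.80],
]

def representative_docs(
    docs: list[str], theta: list[list[float]], topic: int, n: int = 2
) -> list[str]:
    """Return the n documents with the highest proportion of `topic`."""
    ranked = sorted(range(len(docs)), key=lambda d: theta[d][topic], reverse=True)
    return [docs[d] for d in ranked[:n]]

print(representative_docs(docs, theta, topic=0))
# ['Masks and distancing still matter', 'Still wearing my mask indoors']
```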

Topic 5: vaccin, avail, dose, appoint, sign


Topic 5 pertains to vaccination; specifically, vaccine scheduling and availability.

Topic 9: trump, covid, american, vote, biden


Topic 9 contains tweets discussing politics and COVID-19.

Topic 14: doctor, thank, pandem, doctorsday, covid


Topic 14 contains tweets expressing gratitude to doctors and frontline healthcare workers.

Topic 34: variant, peopl, vaccin, delta, covid


Topic 34 pertains to the Delta COVID-19 variant.
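The keyword lists in the topic headings are word stems (e.g., "distanc", "vaccin", "peopl"), which suggests that stemming was applied during preprocessing. They appear to be each topic's highest-ranked words. The Python sketch below ranks words by their probability within a single topic. This is analogous to the "Highest Prob" words reported by `labelTopics` in the R `stm` package, though the ranking the project actually used is an assumption. The vocabulary slice and probabilities are invented for illustration.

```python
# Hypothetical probabilities for a truncated slice of the vocabulary
# under a single topic (one row of an STM-style topic-word matrix).
vocab = ["dose", "wear", "trump", "distanc", "mask", "still", "vaccin", "social"]
topic_word = [0.01, 0.15, 0.005, 0.07, 0.21, 0.09, 0.02, 0.08]

def top_words(probs: list[float], words: list[str], k: int = 5) -> list[str]:
    """Return the k words with the highest probability under one topic."""
    order = sorted(range(len(words)), key=lambda i: probs[i], reverse=True)
    return [words[i] for i in order[:k]]

print("Topic: " + ", ".join(top_words(topic_word, vocab)))
# Topic: mask, wear, still, social, distanc
```

Other labeling schemes, such as `stm`'s FREX, weight words that are both frequent and exclusive to a topic. They can therefore produce different keyword lists from the same model.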